1 Example with fish data from NORS

In this example we are interested in exploring data from a specific data resource – Sjöprovfiskedatabasen NORS (Institutionen för akvatiska resurser, SLU). This data base has 2.8 M observations starting in the 1950’s.

As you may already know, SBDI is a collection of many biodiversity databases. We start by searching for the data resource we are interested in using the function pick_filter(). This is an interactive query guiding you through the many resources available to filtering your query (data resources, spatial layers, and curated species lists).

fq_str <- pick_filter("resource") 
## follow the instructions 

Follow the instruction. Your choices here would have been “in3” –> “dr20.” Your variable fq_str will now contain a string “data_resource_uid:dr20.”

## we cant do it interactive here so we force it
fq_str <- "data_resource_uid:dr20"

But we are not interested in the complete database, but on the last 10 years of data. for this we concatenate (add to a vector) another filter string. These will be treated as AND factors.

y1 <- 2012
y2 <- 2021
fq_str <- c(fq_str, paste0("year:[", y1, " TO ", y2,"]"))
## Note the square brackets are hard limits

For references on how to use the filters see documentation at docs.

Using the function occurrences() we can the query for the observations fulfilling our filter. If you haven’t specified that in the sbdi_config() before (see [Case 1: Example with fish data from NORS] Installation), you need to pass your email and the download reason.

library(SBDI4R)
xf <- occurrences(fq = fq_str,
                 email = "test@test.org", 
                 download_reason_id=10)

## Simply summarise all records by data source 
table(xf$data$dataResourceName)
#> 
#>                                                                                           
#>                                                                                        29 
#> SLU Aqua Institute of Freshwater Research National register of survey test-fishing - NORS 
#>                                                                                    187508

1.1 Plotting data on a map

You can quickly plot all the observations as a PDF file with the function ocurrence_plot(), one page per species:

occurrences_plot(x, "obsPlot.pdf", 
                 grouped=FALSE, 
                 taxon_level="species", 
                 pch='+')

Note that the plot is saved to a pdf file in the current working directory. You can find that with getwd().

1.1.0.1 Leaflet

There are many other ways of producing spatial plots in R. The leaflet package provides a simple method of producing browser-based maps with panning, zooming, and background layers:

library(leaflet)
# drop any records with missing lat/lon values
xfl <- xf$data[!is.na(xf$data$longitude) | !is.na(xf$data$latitude),] 
marker_colour <- rep("#d95f02", nrow(xfl))
# blank map, with imagery background
leaflet(width = "100%") %>% 
  addProviderTiles("Esri.WorldImagery") %>%
  # add markers
  addCircleMarkers(xfl$longitude, xfl$latitude,  
                   radius = 1, 
                   fillOpacity =.5, 
                   opacity = 1,
                   col=marker_colour,
                  clusterOptions = markerClusterOptions())

1.2 Temporal analysis

A quick summary over the years reveal a drop in number of records over time, and stops .

table(xf$data$year)
#> 
#>  2012  2013  2014  2015  2016  2017  2018  2019 
#> 69685 68470 35715  4425  2900  2019  2417  1906
hist(xf$data$year, 
     breaks = seq(y1, y2), 
     xlab = "Year", 
     main = "")

1.3 Species summary

In the same way we can summaries the number of observations for each species, by common or scientific name.

sppTab <- table(xf$data$commonName)
sppDF <- as.data.frame(sppTab)
colnames(sppDF)[1] <- "species"
sppDF
#>                                            species  Freq
#> 1                                                  19481
#> 2                          Alaska fourhorn sculpin   470
#> 3                                  Alpine bullhead   614
#> 4                                  American burbot  7982
#> 5                                         Aral asp   717
#> 6                                      Arctic char  4827
#> 7                                     Baltic vimba   304
#> 8                                            bleak 11355
#> 9                                       Blue bream   502
#> 10                                     brown trout  3827
#> 11                                        Bullhead    91
#> 12                                           carps  1161
#> 13                               common carp, carp    20
#> 14                                     Common dace   357
#> 15                                    crucian carp   695
#> 16                                             eel    79
#> 17                             European pike-perch  6791
#> 18                              European whitefish  8816
#> 19                                        grayling   706
#> 20                                         gudgeon   238
#> 21                                       ide, orfe  1194
#> 22                                 Miller's thumbs   361
#> 23                                          minnow  1975
#> 24                                           perch 28520
#> 25                                            pike 21319
#> 26                                            Pope 15933
#> 27                                   rainbow trout    94
#> 28                                           roach 24811
#> 29                                            rudd  6485
#> 30                                          salmon    35
#> 31                                           smelt  7118
#> 32                                    spined loach   850
#> 33                                           sprat    40
#> 34 ten-spined stickleback, nine-spined stickleback   128
#> 35                                           tench  4492
#> 36                                     white bream  5149
sppTab <- table(xf$data$scientificName)
sppDF <- as.data.frame(sppTab)
colnames(sppDF)[1] <- "species"
sppDF
#>                                         species  Freq
#> 1                Abramis brama (Linnaeus, 1758) 11616
#> 2            Alburnus alburnus (Linnaeus, 1758) 11355
#> 3            Anguilla anguilla (Linnaeus, 1758)    79
#> 4            Ballerus ballerus (Linnaeus, 1758)   502
#> 5              Blicca bjoerkna (Linnaeus, 1758)  5149
#> 6          Carassius carassius (Linnaeus, 1758)   695
#> 7                 Cobitis taenia Linnaeus, 1758   850
#> 8                       Coregonus albula albula  7865
#> 9               Coregonus maraena (Bloch, 1779)  8816
#> 10                  Cottus gobio Linnaeus, 1758    91
#> 11                        Cottus Linnaeus, 1758   361
#> 12               Cottus poecilopus Heckel, 1840   614
#> 13                                   Cyprinidae  1161
#> 14               Cyprinus carpio Linnaeus, 1758    20
#> 15                   Esox lucius Linnaeus, 1758 21319
#> 16                 Gobio gobio (Linnaeus, 1758)   238
#> 17        Gymnocephalus cernua (Linnaeus, 1758) 15933
#> 18            Leuciscus aspius (Linnaeus, 1758)   717
#> 19              Leuciscus idus (Linnaeus, 1758)  1194
#> 20         Leuciscus leuciscus (Linnaeus, 1758)   357
#> 21                   Lota lota (Linnaeus, 1758)  7982
#> 22  Myoxocephalus quadricornis (Linnaeus, 1758)   470
#> 23          Oncorhynchus mykiss (Walbaum, 1792)    94
#> 24           Osmerus eperlanus (Linnaeus, 1758)  7118
#> 25             Perca fluviatilis Linnaeus, 1758 28520
#> 26           Phoxinus phoxinus (Linnaeus, 1758)  1975
#> 27         Pungitius pungitius (Linnaeus, 1758)   128
#> 28             Rutilus rutilus (Linnaeus, 1758) 24811
#> 29                   Salmo salar Linnaeus, 1758    35
#> 30                  Salmo trutta Linnaeus, 1758  3827
#> 31          Salvelinus alpinus (Linnaeus, 1758)  4827
#> 32           Sander lucioperca (Linnaeus, 1758)  6791
#> 33 Scardinius erythrophthalmus (Linnaeus, 1758)  6485
#> 34           Sprattus sprattus (Linnaeus, 1758)    40
#> 35         Thymallus thymallus (Linnaeus, 1758)   706
#> 36                 Tinca tinca (Linnaeus, 1758)  4492
#> 37                 Vimba vimba (Linnaeus, 1758)   304

Perhaps, you need to send this table as a .CSV file to a colleague.

write.csv(sppDF, "NORS species summary.csv")
# NOTE: again this will be saved on your working directory

1.4 Spatial biodiversity analysis

Let’s now ask: how does the species richness vary across Sweden? In this case we want to summarise occurrences species-wise over a defined grid instead of plotting every observation point. First we need to overlay the observations with a grid. In this case, the standard Swedish grids at 50, 25, 10 and 5 km are provided as data in the SBDI4R package (with Coordinate Reference System = WGS84, EPSG:4326).

library(sp) # the function coordinates() and proj4string() are in sp
library(rgeos) # the function over() is in package rgeos
# load some shapes over Sweden's political borders
data("swe_wgs84", package="SBDI4R", envir=environment())
# A standard 50km grid
data("Sweden_Grid_50km_Wgs84", package="SBDI4R", envir=environment())

grid <- Sweden_Grid_50km_Wgs84
# grid <- spTransform(grid, CRS("+init=epsg:4326")) # as mentioned above, the grid already has has the same CRS, but changes are undergoing in the sp package and this step is needed

# make the observations spatial
# NOTE: make sure there are no NAs on either column defining the coordinates
# xf$data[!is.na(xf$data$longitude) | !is.na(xf$data$latitude),]

obs <- as.data.frame(xf$data)
coordinates(obs) <- obs[,c("longitude","latitude")]
wkt <- sf::st_crs(4326)[[2]]
proj4string(obs) <- sp::CRS(wkt) #CRS("+init=epsg:4326")

nObs <- nrow(obs)

# overlay the data with the grid
ObsInGridList <- over(grid, obs, returnList=TRUE)
wNonEmpty <- unname( which( unlist(lapply(ObsInGridList, nrow)) != 0) )
if(length(wNonEmpty)==0) message("Observations don't overlap any grid cell.")

The result ObsInGridList is a list object with a subset of the data on each grid. Now summarise occurrences within grid cells:

# check n the total number of observations
sum(unlist(lapply(ObsInGridList, nrow)))
#> [1] 187537

# apply a summary over the grid cells 
nCells <- length(ObsInGridList)

res <- data.frame("nObs"=as.numeric(rep(NA,nCells)),
                  "nYears"=as.numeric(rep(NA,nCells)),
                  "nSpp"=as.numeric(rep(NA,nCells)),
                  row.names = row.names(grid),
                  stringsAsFactors = FALSE)

cols2use <- c("scientificName", "year")

dataRes <- lapply(ObsInGridList[wNonEmpty], function(x){
  x <- x[,cols2use]
  colnames(x) <- c("scientificName", "year")
  return(c("nObs" = length(x[,"scientificName"]),
           "nYears" = length(unique(x[,"year"])),
           "nSpp" = length(unique(x[,"scientificName"]))
           ))
  })

dataRes <- as.data.frame(dplyr::bind_rows(dataRes, .id = "gridID"))

res[wNonEmpty,] <- dataRes[,-1]

resSp <- sp::SpatialPolygonsDataFrame(grid, res)

And finally plot the grid summary as a map:

palBW <- leaflet::colorNumeric(c("white", "navyblue"),
                               c(0, max(resSp@data$nSpp, na.rm = TRUE)),
                               na.color = "transparent")
oldpar <- par()
par(mar = c(1,1,0,0))
plot(resSp, col=palBW(resSp@data$nSpp), border = NA)
plot(swe_wgs84$Border, border=1, lwd=1, add=T)
legend("bottomleft", 
       legend = round(seq(0, max(resSp@data$nSpp, na.rm = TRUE), length.out = 5)),
       col = palBW(seq(0, max(resSp@data$nSpp, na.rm = TRUE), length.out = 5)),
       title = "Number of \nspecies", pch = 15, bty="n")
suppressWarnings(par(oldpar))